Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization

Authors

  • Jun Wang
  • Alla Zelenyuk
  • Dan Imre
  • Klaus Mueller

Abstract

While big data is revolutionizing scientific research, data management and analytics are becoming more challenging than ever. One way to ease this difficulty is to obtain the multilevel hierarchy embedded in the data. Knowing the hierarchy not only reveals the nature of the data but is also often the first step in big data analytics. However, current algorithms for learning the hierarchy typically do not scale to large volumes of high-dimensional data. To tackle this challenge, we propose a new scalable approach for constructing the tree structure from data. Our method builds the tree bottom-up with an adapted incremental k-means algorithm. By referencing the distribution of point distances, one can flexibly control the height of the tree and the branching of each node. Dimension reduction is also performed as a preprocessing step to further improve computational efficiency. The algorithm has a parallel design and is implemented in CUDA (Compute Unified Device Architecture), so that it can be applied efficiently to big data. We test the algorithm on two real-world datasets and visualize the results with extended circular dendrograms and other visualization techniques.
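The abstract describes the construction only at a high level. As an illustration, here is a minimal serial sketch of one way a bottom-up, threshold-driven incremental k-means tree could be built. It is not the authors' CUDA implementation. The function names (`distance_quantile`, `incremental_kmeans`, `build_tree`), the rule that sets each level's merge radius from a quantile of sampled pairwise distances, and the single-pass "join the nearest centroid or start a new cluster" update are all assumptions made for illustration; dimension reduction and GPU parallelism are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_quantile(points: List[Point], q: float,
                      sample: int = 200, seed: int = 0) -> float:
    """Hypothetical heuristic: q-quantile of pairwise distances in a random sample."""
    rng = random.Random(seed)
    pts = points if len(points) <= sample else rng.sample(points, sample)
    ds = sorted(dist(a, b) for i, a in enumerate(pts) for b in pts[i + 1:])
    if not ds:
        return 0.0
    return ds[min(len(ds) - 1, int(q * len(ds)))]


def incremental_kmeans(points: List[Point], radius: float):
    """One pass of threshold-based incremental k-means (assumed variant).

    Each point joins its nearest centroid if it lies within `radius`, and that
    centroid is updated as a running mean; otherwise the point seeds a new
    cluster. Returns (centroids, members); members[k] lists input indices.
    """
    centroids: List[List[float]] = []
    members: List[List[int]] = []
    for idx, p in enumerate(points):
        best, best_d = -1, math.inf
        for k, c in enumerate(centroids):
            d = dist(p, c)
            if d < best_d:
                best, best_d = k, d
        if best >= 0 and best_d <= radius:
            members[best].append(idx)
            n = len(members[best])
            c = centroids[best]
            for j in range(len(c)):
                c[j] += (p[j] - c[j]) / n  # running mean: c <- c + (p - c) / n
        else:
            centroids.append(list(p))
            members.append([idx])
    return [tuple(c) for c in centroids], members


def build_tree(points: List[Point], quantiles: Sequence[float]):
    """Build the tree bottom-up; each level clusters the centroids of the level below.

    `quantiles` (hypothetical knob) gives one distance quantile per level, which
    sets that level's merge radius and so controls branching and height.
    Returns a list of (centroids, members) levels, leaves first.
    """
    levels = []
    current: List[Point] = list(points)
    for q in quantiles:
        if len(current) <= 1:
            break
        radius = distance_quantile(current, q)
        centroids, members = incremental_kmeans(current, radius)
        levels.append((centroids, members))
        current = centroids
    return levels


if __name__ == "__main__":
    demo = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
    for depth, (cents, _) in enumerate(build_tree(demo, [0.1, 0.5]), start=1):
        print(f"level {depth}: {len(cents)} nodes")
```

In this sketch, a lower quantile yields a smaller radius and therefore more, tighter clusters per level, while a higher quantile merges more aggressively and shrinks the upper levels faster.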

Journal:
  • Informatics

Volume: 4, Issue:

Pages: -

Publication date: 2017